WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 7 
G06F 9/44, 9/45 



A1



(11) International Publication Number: WO 00/22519 

(43) International Publication Date: 20 April 2000 (20.04.00) 



(21) International Application Number: PCT/US99/23919 

(22) International Filing Date: 14 October 1999 (14.10.99)



(30) Priority Data:
09/173,158    14 October 1998 (14.10.98)    US



(71) Applicant: ALCATEL USA SOURCING, L.P. [US/US]; 1000
Coit Road, Plano, TX 75075 (US).

(72) Inventor: TOWNSEND, Arthur, R.; 6532 Blue Ridge Trail,
Plano, TX 75023-3003 (US).

(74) Agent: FISH, Charles, S.; Baker & Botts, L.L.P., 2001 Ross
Avenue, Dallas, TX 75201-2980 (US).



(81) Designated States: AE, AL, AM, AT, AT (Utility model), AU,
AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, CZ
(Utility model), DE, DE (Utility model), DK, DK (Utility
model), EE, EE (Utility model), ES, FI, FI (Utility model),
GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE,
KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG,
MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE,
SG, SI, SK, SK (Utility model), SL, TJ, TM, TR, TT, UA,
UG, UZ, VN, YU, ZA, ZW, ARIPO patent (GH, GM, KE,
LS, MW, SD, SL, SZ, TZ, UG, ZW), Eurasian patent (AM,
AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT,
BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU,
MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, GW, ML, MR, NE, SN, TD, TG).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 






(54) Title: ASSEMBLY LANGUAGE TRANSLATOR 



[Front-page drawing: ASSEMBLER SOURCE feeding the Z TO C TRANSLATOR FLOW]




(57) Abstract 

A computer-implemented method of translating an assembler program into a high-level language computer program is provided. The 
method includes receiving each line of the assembler program, parsing individual fields in each assembler program line, including an absolute 
line number and an opcode for an assembler instruction. Each assembler program line is then stored into a data structure such that each 
line is accessible and each field in each line is accessible, and the numeric opcode of each assembler program line is parsed into individual 
digits. Alternatively, the symbolic opcode and operands may be decoded to produce a more human readable and maintainable output. 
For each assembler program line, a decision tree is traversed in response to the value of each opcode digit to identify the corresponding 
assembler instruction. The assembler instruction is then translated to an equivalent set of code in the high-level computer language. The 
equivalent set of code for each assembler program line is generated and provided as output. 
ASSEMBLY LANGUAGE TRANSLATOR

TECHNICAL FIELD OF THE INVENTION

This invention is related in general to the field of
computer applications. More particularly, the invention is
related to a translator and method therefor for translating
from a computer program in assembly language to a computer
program in a high-level language, such as C.

BACKGROUND OF THE INVENTION 

Computers can be programmed by applications written in
many different types of computer languages. The earlier
computer applications were typically written in assembly
language, such as Z8000 assembly language. More recently,
the preference in programming languages is high-level
languages such as C and C++, which are easier to understand,
code, and debug because of their closer resemblance to
English. Further, these high-level languages often possess
other advantages such as portability to different computing
platforms. Because a large volume of complex application
programs deployed in many industries is already written in
assembly languages, manually rewriting these programs in
high-level languages would be extremely costly. Therefore, a
more cost effective way to recode these application programs
in high-level languages is desired.
SUMMARY OF THE INVENTION

Accordingly, there is a need for a translator that is
capable of translating a program written in an assembly
language to a high-level language such as C.

In accordance with the present invention, an assembly
language to a high-level language translator is provided
which eliminates or substantially reduces the disadvantages
associated with prior systems and methods.

In one aspect of the invention, a computer-implemented
method of translating an assembler program into a high-level
language computer program is provided. The method includes
receiving each line of the assembler program and parsing
individual fields in each assembler program line, including
an absolute line number and an opcode for an assembler
instruction. Each assembler program line is then stored into
a data structure such that each line is accessible and each
field in each line is accessible, and the opcode of each
assembler program line is parsed into individual digits. For
each assembler program line, a decision tree is traversed in
response to the value of each opcode digit to identify the
corresponding assembler instruction. The assembler
instruction is then translated to an equivalent set of code
in the high-level computer language. The equivalent set of
code for each assembler program line is generated and
provided as output.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the present invention,
reference may be made to the accompanying drawings, in
which:

FIGURE 1 is a top level block diagram of an embodiment
of the translator of the present invention;

FIGURE 2 is a more detailed block diagram of an
embodiment of the translator of the present invention;

FIGURE 3 is a flowchart of an embodiment of a
preprocessor process flow of the present invention;

FIGURE 4 is a flowchart of an embodiment of a
converter process flow of the present invention;

FIGURE 5 is a more detailed block diagram of an
embodiment of a converter process flow of the present
invention; and

FIGURE 6 is a flowchart of an embodiment of the
converter process flow of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 

The preferred embodiments of the present invention are
illustrated in FIGURES 1-6, like reference numerals being
used to refer to like and corresponding parts of the
various drawings.

FIGURE 1 is a simplified block diagram of a translator
10 operable to translate an origination computer source
listing 12 in an assembly language, such as Z8000, to
target high-level language intermediate source listings 13.
Translator 10 may include a process 15 for address
resolution, a preprocessor 16, and a converter 18. In an
embodiment of the present invention, the target language
for source listings 13 is C and languages like C. However,
the translator and method therefor of the present invention
are applicable to translation to other high-level
languages.

FIGURE 2 is a more detailed block diagram of an
embodiment of the present invention. Assembler source code
files 20 are what the computer programmers or software
engineers have generated. Source files 20 are then
supplied to an assembler program 22, which assembles the
code and generates two types of output files, listing files
24 and ROB (relocatable object binary) files 26. Source
files 20 are assumed to be syntactically correct, so as to
not generate compile errors. Listing files 24 are in a
human readable format and may contain relocatable values or
addresses. There are many assembler programs which are
capable of performing this function, including the YASM™
assembler program, developed by Unidot of Golden, Colorado.
ROB files 26 may be provided to a linker program 28, such
as YALL™, developed by Unidot of Golden, Colorado, which
then generates a MAP listing file 30 and an assembly
executable file 32. Linker 28 binds all ROB files 26
together so that MAP listing file 30 and assembly
executable file 32 containing variables and their
respective absolute memory addresses may be produced.
Assembly executable file 32 is in machine code format and
MAP listing file 30 is in a human readable format. Both
are assumed to be bug-free and working in the source
environment.

Listing files 24, MAP listing file 30, and assembly
executable file 32 are then provided as input to translator
10. Address resolution process 15 receives listing files
24, MAP listing file 30, and executable file 32, and
generates listing files 34 that contain absolute addresses.
For example, the following lines are from an exemplary
listing file 24:

    46 0044 4D05 80000000*    42 LD     BUFF+0, #0
    47 004A 0000

In Z8000 assembly language, opcodes are represented in four
hexadecimal digits. The opcode "4D05" indicates the load
or LD instruction with an immediate value and direct
addressing mode. The asterisk "*" denotes or flags the
"80000000" value for "BUFF" as a relocatable value which
requires resolution. The following are the same lines from
an exemplary absolute address listing file 34:

    0500027c 4D05 8500097A 46 0044 4D05 80000000*    42 LD     BUFF+0, #0
    05000282 0000          47 004A 0000

It may be seen that the relocatable value has been
determined to have the absolute value of "8500097A" by
address resolution process 15, which obtained this
information from MAP listing 30 and assembly executable
file 32. Therefore, the instruction including absolute
addresses or values is added to the front of each line by
address resolution process 15. Assembly executable file 32
is a binary executable file that runs in native mode on a
Z8000 system.

The output of address resolution process 15, absolute
address listing files 34, is then provided as input to
preprocessor 16. Preprocessor 16 reads each line in
absolute address listing files 34 into a data structure in
memory, and generally prepares the listing lines for
translation by converter 18. Details of preprocessor 16
are shown in FIGURE 3 and described below. The output of
preprocessor 16 is fed to converter 18, which performs the
task of converting the listing line-by-line to the
high-level language, such as C. The output from converter 18
is converted C source files 36. C source files 36 are the
program translated from assembly code. Preferably, C
source files include the original assembly code as comments
so that it is readily apparent which lines of
assembly code were converted to the current lines of C
code. Typically, C source files 36 have a ".c" file
extension.
Another component of translator 10 is a create RAM
definitions process 38, which generates a file 40
containing assembly code and data, free memory, and stack
space definitions from assembly executable 32. The output,
C data definition file 40, is the equivalent of assembly
executable file 32, but in ASCII format, which is
compatible with C code. A compiler 42 then takes C source
files 36 and data definition file 40 and compiles or
transforms the source code into object code stored in a C
executable file 44, usually called a.out.

FIGURE 3 is a flowchart 50 of preprocessor 16
according to an embodiment of the present invention. Some
initialization is first performed, as shown in block 52.
Initialization may include opening the input and output
files, defining the format for a valid absolute address
listing line, defining the fields of a valid absolute
address listing line, defining the assembly instructions
and the corresponding opcodes, and preparing a data
structure for storing the listing lines read into the
program. In block 54, the input file or absolute address
listing file 34 is read in line-by-line. Each line is then
split or parsed into individual fields, as shown in block
56. For example, these fields may include absolute
address, absolute data, one or more flags, relative line
number, relocatable field, absolute line number, label,
opcode, operand, and comment. The absolute data field
contains the hexadecimal equivalent of the opcode and one
or more operands. An example of an absolute listing line
is:

    05000192 2102 0002 42 015A 2102 0002 130 addr_data: LD r2, #2

"05000192" is the absolute address;
"2102 0002" is the absolute data;
"42" is the relative line number;
"015A 2102 0002" is the potentially relocatable data;
"130" is the absolute line number;
"addr_data" is a label;
"LD" is the opcode;
"r2, #2" is the operand.

The values of these fields are saved as global variables.
All white spaces between the fields in the lines are
discarded. Because the absolute address listing lines are
in a fixed and known format, the fields can be readily
parsed.

Each assembler listing line is further categorized
into at least two types, those containing valid assembly
instructions and those which do not. The lines of interest
are lines that contain a nonblank absolute data field.
When the absolute data field is nonblank, the absolute data
field contains an opcode and (possibly) operands that need
conversion. Therefore, lines containing a blank absolute
data field are filtered out or discarded, as shown in block
58. The discarded lines may be preserved in memory but
flagged as comment lines.

In blocks 60 and 62, the individual fields are
recombined into a line and stored in a data structure in
internal memory. An array structure may be used to store
the program listing lines. The data structure typically
does not physically occupy a contiguous block of memory.
The individual listing lines stored in the array may be
accessible by an index or one or more pointers, for
example. Additionally, a predetermined character, such as
"|", may be used to delineate or separate the fields in
each line. In this manner, each line may be easily
referenced, as well as each field within each line.
Further, each line may be easily taken apart and
reconstituted with a subset of the fields.
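As a rough illustration of how a "|"-delimited line can be taken apart into its fields, the following C sketch splits a line in place. The function name split_fields and the limit MAX_FIELDS are hypothetical and do not come from the specification; the exact field order is likewise only an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical upper bound on fields per line */

/* Split `line` in place on '|' and store a pointer to each field in
 * `fields`. Empty fields are preserved (strtok() would skip them).
 * Returns the number of fields stored, at most MAX_FIELDS. */
size_t split_fields(char *line, char *fields[MAX_FIELDS])
{
    size_t count = 0;
    char *start = line;

    assert(line != NULL && fields != NULL);
    while (count < MAX_FIELDS) {
        char *bar = strchr(start, '|');
        fields[count++] = start;
        if (bar == NULL)
            break;          /* last field reached */
        *bar = '\0';        /* terminate this field */
        start = bar + 1;    /* next field begins after the delimiter */
    }
    return count;
}
```

Because empty fields are kept, a line with a blank label still yields the opcode at the same index as a line that carries a label, which is what makes field-by-field access dependable.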
Beginning in block 64, each line is further examined
and processed. Where an instruction is contained on two
successive lines, such as the load (LD) instruction:

    0500027c 4D05 8500097A 46 0044 4D05 80000000*    42 LD     BUFF+0, #0
    05000282 0000          47 004A 0000

the second line contains an operand, the value zero (0000),
which is to be loaded into the absolute address location
8500097A. After the processing done in block 66, the above
two lines become:

    0500027c 4D05 8500097A 0000 46 0044 4D05 80000000*    42 LD     BUFF+0, #0

It may be seen that the operand value is now inserted into
the first line. The original two lines may be kept
proximate to, or immediately before or after, the new line
but flagged by a special flag so that they are not further
processed. The instruction, including the opcode and
operand, is now contained on a single line, which
facilitates the subsequent conversion process.
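The merge of a continuation line into the line that carries the opcode can be sketched in C as follows. The record layout (struct listing_line), the capacity DATA_LEN, and the function name are assumptions made for illustration. For brevity the sketch updates the first record in place instead of creating a new line and flagging both originals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DATA_LEN 64  /* hypothetical capacity of the absolute data field */

/* Hypothetical in-memory record for one listing line. */
struct listing_line {
    char abs_data[DATA_LEN];  /* e.g. "4D05 8500097A" */
    char opcode[8];           /* symbolic opcode; empty on a continuation line */
    int  skip;                /* nonzero: flagged so it is not processed again */
};

/* Append the operand words of continuation line `next` to `first`.
 * Returns 1 if merged, 0 if `next` is not a continuation line or the
 * merged data would not fit. */
int merge_continuation(struct listing_line *first, struct listing_line *next)
{
    char merged[DATA_LEN];
    int len;

    assert(first != NULL && next != NULL);
    if (next->opcode[0] != '\0' || next->abs_data[0] == '\0')
        return 0;   /* has its own opcode, or carries no data */

    len = snprintf(merged, sizeof merged, "%s %s",
                   first->abs_data, next->abs_data);
    if (len < 0 || (size_t)len >= sizeof merged)
        return 0;   /* result would be truncated */

    strcpy(first->abs_data, merged);
    next->skip = 1; /* keep the original line but exclude it from conversion */
    return 1;
}
```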

In block 68, labels are separated from opcodes and put
into two different lines. For example, the original line
may contain both the opcode and a symbolic label:

    05000192 2102 0002 42 015A 2102 0002 130 addr_data: LD r2, #2

The resultant lines may be:

    05000192           42 015A 2102 0002 130 addr_data
    05000192 2102 0002 42 015A 2102 0002 130            LD r2, #2

This process is carried out because the label may indicate
the beginning of a section of code (a function) that is
separate from the instruction line in which it originally
resided. Therefore, the instruction and the label
should be separated into two different lines and processed
independently of one another.

Next, the preprocessor resolves targets of certain
jump instructions, as shown in block 70. This step is
necessary because the assembler may generate strange and
non-unique labels, particularly within expansion of certain
macros. Jump instructions may include JP (jump) and JR
(jump relative) instructions. For a JP instruction, the
jump address is given in the original line. Therefore, the
original line is copied and the jump absolute address is
inserted in a field, such as an operand, for example. One
or more characters may be added to the absolute address
operand to form a symbolic address. For example, the
letters "AH_" may be added to the beginning of the absolute
jump address. The absolute target is further stored. The
absolute target address may be stored in an array,
desired_absolute_hex_values, which may be indexed by the
jump target address. The original line is then deleted or
maintained and flagged as a comment line. In block 72,
during a second pass of the listing lines, the absolute
address for each valid line is compared with the value
stored in the array. If there is a match, then the
symbolic label "AH_XXXXXXXX:", where "XXXXXXXX" is the
absolute address, is inserted immediately before the line
at this absolute address. The corresponding absolute
address value stored in the desired_absolute_hex_values
array is then deleted.

For a JR instruction, a given operand is the value
of the displacement or offset to the intended jump target.
Therefore, the absolute address of the target may be easily
determined and stored in a desired_relative_hex_values
array, for example. The original line is deleted or
maintained and flagged as a comment line. In block 72,
during a second pass of the listing lines, the absolute
address for each valid line is compared with the value
stored in the array. If there is a match, then the
symbolic label "RH_XXXXXXXX:", where "XXXXXXXX" is the
absolute address, is inserted immediately before the line
at this absolute address. The corresponding absolute
address value stored in the desired_relative_hex_values
array is then deleted.

Finally, in block 74, the resultant listing lines
stored in the data structure are provided as output. At
the same time each line is provided as output, where a
label on a line begins a C function, this is so indicated
in the output. These labels may be special comments of a
known format in the origination source listing which begin
a function or procedure. A special flag or comment of a
known format may be inserted into the line to indicate
these starting points.

In block 78, any inconsistencies are either flagged
or reported. One item that this process checks may be that
there should be at least one procedure or function.
Another error condition is that each function should end
with either a RET (return), JP, or JR, and that there be at
least one RET within the body of the function. This
ensures that there is a valid return from each function.
Another error condition checked by the process includes
examining the desired_absolute_hex_values and the
desired_relative_hex_values arrays to ensure that both are
empty. This ensures that labels for jump targets have been
properly inserted for all jump instructions. The
preprocessor ends in block 80.
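The two-pass handling of jump targets (record the target on the first pass, emit a label and delete the entry on the second, then verify nothing is left over) might be sketched in C as follows. The structure target_list stands in for the desired_absolute_hex_values or desired_relative_hex_values array; its layout, the capacity MAX_TARGETS, the function names, and the use of upper-case hex digits in the label are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TARGETS 256  /* hypothetical capacity */

/* Pending jump targets collected during the first pass. */
struct target_list {
    unsigned long addr[MAX_TARGETS];
    size_t count;
};

/* First pass: remember the absolute address a jump refers to. */
int add_target(struct target_list *t, unsigned long addr)
{
    assert(t != NULL);
    if (t->count >= MAX_TARGETS)
        return 0;
    t->addr[t->count++] = addr;
    return 1;
}

/* Second pass: if `addr` is a pending target, write its label into
 * `label`, remove every matching entry, and return 1; else return 0.
 * `prefix` is "AH_" for JP targets or "RH_" for JR targets. */
int take_label(struct target_list *t, unsigned long addr,
               const char *prefix, char *label, size_t label_size)
{
    size_t i = 0;
    int found = 0;

    assert(t != NULL && prefix != NULL && label != NULL);
    while (i < t->count) {
        if (t->addr[i] == addr) {
            t->addr[i] = t->addr[--t->count];  /* overwrite with last entry */
            found = 1;
        } else {
            i++;
        }
    }
    if (found)
        snprintf(label, label_size, "%s%08lX:", prefix, addr);
    return found;
}
```

After the second pass, a nonzero count signals a jump whose target line was never encountered, which corresponds to the consistency check on the two arrays.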

After preprocessing, the conversion process begins.
An embodiment of the conversion process 90 is shown in
FIGURE 4. Converter 18 receives, as input, the output from
preprocessor 16, and examines and processes each line, as
shown in block 92. For each line, the field
indicative of the opcode, operand, and mode of operation is
split into four hexadecimal digits, as shown in block 94.
The first digit is examined and then the second, as shown
in block 96. The values of the last two digits are then
considered in the context of the first two digits, as shown
in block 98. This process is similar to using a decision
tree, where the value of the first digit causes a branching
to a number of branches, the value of the second digit
causes further branching in those branches, and so on. The
equivalent C functionality is then determined and provided
as output, as shown in block 100. The process ends in
block 102.

Referring to FIGURE 5, a block diagram illustrating
this process is shown. The first hexadecimal digit is
examined to determine what it is, as shown in block 110.
The first digit value 112 may range from 0-F. Depending
upon what the first digit value is, the second hexadecimal
digit is then examined to determine its value, as shown in
block 114. The second digit value 116 may also range from
0-F. Upon determination of the first and second digit
hexadecimal values, a determination of the third and
fourth digit values is made, as shown in block 118. In
many cases, the fourth digit specifies a particular
register on which the first three digits of the opcode
perform a particular operation. For example, the opcode
"0004" is "ADDB RH4, ..." with immediate source addressing
mode, which adds the immediate value specified in the
operand to the value in byte register number 4.
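The digit-by-digit traversal can be sketched in C as nested branches. This is a minimal sketch that recognizes only the opcodes quoted in this description ("0004", "0105", "2102", "2142", "4C05", "4D05", and "5C_1"); the function name identify and the returned strings are hypothetical, and a real decoder would cover the full instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Classify a 16-bit opcode word by walking its hex digits from the
 * most significant down. Only opcodes quoted in the description are
 * handled; everything else yields NULL. */
const char *identify(unsigned opcode)
{
    unsigned d1 = (opcode >> 12) & 0xF;
    unsigned d2 = (opcode >> 8) & 0xF;
    unsigned d3 = (opcode >> 4) & 0xF;
    unsigned d4 = opcode & 0xF;   /* often a register number */

    assert(opcode <= 0xFFFF);
    switch (d1) {
    case 0x0:
        switch (d2) {
        case 0x0: return d3 == 0 ? "ADDB immediate" : NULL;
        case 0x1: return d3 == 0 ? "ADD immediate" : NULL;
        }
        break;
    case 0x2:
        if (d2 == 0x1)   /* third digit selects the source register */
            return d3 == 0 ? "LD immediate" : "LD indirect register";
        break;
    case 0x4:
        if (d3 == 0 && d4 == 5) {
            if (d2 == 0xC) return "LDB direct, immediate";
            if (d2 == 0xD) return "LD direct, immediate";
        }
        break;
    case 0x5:
        if (d2 == 0xC && d4 == 1) return "LDM indexed";
        break;
    }
    return NULL;   /* not covered by this sketch */
}
```

For instance, identify(0x0105) walks the "0" branch, then the "1" branch, and reports an ADD with an immediate operand; the fourth digit (5) would then select register R5.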

Alternatively, symbolic opcode decoding or a
combination of decoding the numeric opcode and the symbolic
opcode may be used to determine what the assembly
instruction is. Numeric opcode and operand decoding alone
produces accurate equivalent C code that is portable to
different platforms. However, such code may not be human
readable or maintainable. Therefore, when the only goal is
to convert assembly code to portable C code, numeric
decoding is all that is required to produce accurate
equivalent C functionality. On the other hand, if human
intervention after conversion is desirable, such as to
maintain the C code, symbolic opcode and operand decoding
may be necessary to enhance the human readability of the
converted C code.

Therefore, instead of or along with parsing the
numeric opcode and operand, the symbolic opcode and operand
are parsed and examined. Referring to FIGURE 6, another
embodiment of a Z to C converter process 130 is shown.
Each line is examined and processed, as shown in block 132.
The opcode character string is examined to determine the
assembly instruction specified in each line, as shown in
block 134. In general, the opcode is a two to four
character string. Upon determining the opcode, the
operands are examined, as shown in block 135. Depending on
the value of the opcode character string, there may be one
or two operands. Typically, if there is only one operand,
that operand is either the source or destination. If there
are two operands, the format is typically the destination
operand followed by the source operand, with a comma
separating the two operands. In block 136, the equivalent
C functionality is then determined. The following examples
provide a discussion of both decoding methods.

For example, in implementation, a series of if-then-else
statements conditioned on the value of the first
numeric digit or one or more opcode characters or a
character string may be used to branch execution into a
number of code sections, such as subroutines or procedures.
In each code section, another series of if-then-else
statements conditioned on the value of the second digit or
subsequent opcode and operand characters may be used to
further differentiate the groups of opcodes.
Alternatively, multiple branch instructions, such as a CASE
statement, may be used to conditionally branch execution
based on the values of the opcode digits and characters.

As discussed above, the values of the first two
hexadecimal opcode digits may be used to quickly narrow
down the set of possible assembler instructions. In a
preferred embodiment, associative arrays may be used to
further facilitate this translation process. For example,
an associative array may be defined in perl script in this
manner:

    %RX=(
        '0', 'R0',
        '1', 'R1',
        '2', 'R2',
        '3', 'R3',
        ...
        'R10',
        ...
        'R15' );
An exemplary listing line with the numeric opcode "0105"
and symbolic opcode "ADD R5,#6", such as:

    46   015C 0105 0006   153   ADD R5,#6

may be translated using the associative array definition
above by the following exemplary perl script in the
subroutine executed when the first hexadecimal opcode digit
is a "0":

    if ($second_hex_digit eq '1') {
        if ($third_hex_digit eq '0') {
            print "$RX{$fourth_hex_digit} = add_word($RX{$fourth_hex_digit}, ",
                  "0x$hex_operands[0]);\n";
        }
    }

Therefore the resultant C code from numeric opcode decoding
may be:

    R5 = add_word(R5, 0x0006);

where "add_word()" is a macro that may be expanded by a C
preprocessor. Alternatively, a library of run time C
functions may be provided for execution with the converted
C program. These C functions in the run time library
simulate the functionality of certain known assembler
instructions or chunks of code, such as the add_word()
function to perform the instruction ADD.
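A run time helper of this kind might look like the following C sketch. The specification names add_word() but does not give its body, so the implementation here is an assumption: it performs a 16-bit addition with wraparound and does not model the processor's condition flags. The wrapper converted_add_r5_6 is a hypothetical name used only to show the converted statement in context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for add_word(): adds two 16-bit words with
 * wraparound. Condition flags are not modelled in this sketch. */
uint16_t add_word(uint16_t dst, uint16_t src)
{
    return (uint16_t)(dst + src);
}

/* Converted form of "ADD R5,#6", applied to a caller-supplied
 * variable that plays the role of register R5. */
void converted_add_r5_6(uint16_t *r5)
{
    assert(r5 != NULL);
    *r5 = add_word(*r5, 0x0006);
}
```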

When symbolic decoding is performed, the character
string "ADD R5,#6" is examined and parsed. It is
determined that the instruction is ADD, and that the
operands indicate that the operation is to be performed on
register 5, or R5, involving the immediate value 6. So
the resultant C code may be:

    R5 = R5 + 6;

or, taking advantage of C shorthand notation:

    R5 += 6;

As an additional example, consider the following
line:

    05000288|2142||54|0050 2142||150||LD|R2,@RR4|

The first two digits "21" in the numeric opcode show that
this is a LD (load register) instruction with indirect
register addressing mode. The third digit "4" means source
register 4 (RR4) and the final fourth digit "2" means
destination register 2 (R2). Within the Z8000 processor,
register RR4 (32 bits) is a concatenation of registers R4
and R5 (each 16 bits). In segmented mode, RR4 contains a
32 bit address. The most significant byte in R4 contains
the segment number, and R5 contains the offset within that
segment. When converted to C after numeric opcode decoding
the following two lines are printed:

    /* 05000288|2142||54|0050 2142||150||LD|R2,@RR4| */
    R2 = MEMWS(SEG_BYTE4, R5);

The first line is the assembler listing line, which becomes
simply a comment in the C code produced. The second line
is the actual C code that ultimately will be compiled and
executed on the target machine. Each part of this line of
C code is described as follows:

R2 : the destination register.
MEMWS : this is a macro for "memory word segmented".
SEG_BYTE4 : the first of two arguments to the above macro.
This is the byte within R4 containing the segment number.
R5 : the offset within the source segment.
Therefore, the entire line of C code means "load into R2
the 16-bit value in the segment specified by SEG_BYTE4,
with offset in R5". In fact, each of the above items is a
macro. After expansion by the C preprocessor the above
line of code becomes:

    (reg.words[2]) = (*((unsigned short *)(&(((unsigned char *)(seg_pcr[(((*
    ((unsigned char *)(&(reg.longs[2]))))&0x7f))&0x7f]))[((reg.words[5]))
    &0xfffe]))));

However, by using symbolic decoding to achieve a more
natural way to express this functionality, the output in C
is:

    R2 = *((short *)RR4);

In this form of expression, the value in RR4 is a 32-bit
address in a linear addressing space. While accurate, this
example illustrates that in order to achieve human
maintainable C code, the concept of segmentation is lost in
the conversion process.
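To make the relationship between R4, R5, and RR4 concrete, the following C sketch models a segmented word load. It is not the expansion of MEMWS; the function names, the per-segment pointer table, and the explicit big-endian byte assembly (the Z8000 stores the high byte of a word at the lower address) are assumptions chosen so the example is independent of the host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RR4 is the register pair R4:R5, with R4 as the high word. */
uint32_t pair(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}

/* Segment number: low 7 bits of the most significant byte of R4. */
unsigned segment_of(uint16_t r4)
{
    return (r4 >> 8) & 0x7F;
}

/* Load a 16-bit word from segment memory. `segs` is a hypothetical
 * table with one byte array per segment number. */
uint16_t load_word(const uint8_t *segs[], uint16_t r4, uint16_t r5)
{
    const uint8_t *seg;
    uint16_t off = r5 & 0xFFFE;   /* word accesses use an even offset */

    assert(segs != NULL);
    seg = segs[segment_of(r4)];
    assert(seg != NULL);
    return (uint16_t)((seg[off] << 8) | seg[off + 1]);  /* high byte first */
}
```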

Consider -the following line:, 

25 050007c8|4C05 85000EC3 55 55 I 1451 0590 4C05 80000005* 1 I 428 I 1 LDB I BUFF+5, #%55 I 

The opcode "4C05" shows that this is a LDB (load byte) 
instruction with -direct.,- addressing mode,, and an , immediate 
' value- The first -operand "85GQ0ECB" .is an absolute 
30-. segmented address.' .-The. second operand "55" is th.e 
. immediate- value (written- as "#%55" in the instruction) . 
The third operand .(the second- ! '55") : is ignored. 
■ - . The segmented -address is split as "85 00 0ECB" . 

The : first .two digits,, "85", . represent . segment number 5. 
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Because the .most, significant bit is. set, this represents a 
lon : g .farm" (32 bit), segmented:, address,. A rshort . form (16 
bit) . segmented address , would t be .. "05CB" . The "00" is 
ignored. The "0EC3" is ' the ' of f set . within the segment. 
When converted to C, the following two lines are generated: 

/*050O073c!4C0S 8S000EC& <55 55 1 I 3 1.0504 . 4C0S 8000000S* I I 393 t,l LDBI BUFF+5, *%SS I* / 
MEMBS(0x35 & 0x7f, OxOECB) = 0x55; 

The second of the above two lines is the actual. C code that 
ultimately will be- compiled and executed on. the target 
machine'. Each part of this line of' C code is described as 
follows : 

MEMBS : this is a macro for "memory byte segmented" . 

i 

0x85 Sl 0x7f : the first of two arguments to the above macro. 
This is resolved by the compiler to* "5~" ' meaning segment 5. 

OxOECB : the offset within the' target segment.' • • 

This line of ' C code means "load the value 0x55 into memory 
location segment 5, offset OxOECB." 

After expansion- by -the C preprocessor the above line of 
code becomes: 

( { (.unsigned char-) iseg_pcr [ <0x85&0x7f ) &0x7f ] ) ) [ (unsigned short) (OxOECB) ] )=0x55; 

Using symbolic opcode decoding, the "BUFF+5" is the
destination operand, which can be written as "BUFF[5]".
The conversion process infers that the address of BUFF in
the Z8000 is segment 5, offset 0x0EC6. Therefore this may
be written as: BUFF[5] = 0x55; While accurate, this
conversion process also does away with the concept of
segmentation and BUFF becomes an address within a 32-bit
linear addressing space.
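
The splitting of a segmented address used in this example may be sketched as below. This is a minimal sketch, not the translator's own code: the type seg_addr and the functions decode_seg_addr and check_decode are hypothetical names, and the short form is assumed to carry an 8-bit offset in the low byte of its single word, which is consistent with the "05CB" example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; not part of the original description. */
typedef struct {
    unsigned segment;  /* 7-bit segment number */
    unsigned offset;   /* offset within the segment */
    int      is_long;  /* 1 = long form (32 bit), 0 = short form (16 bit) */
} seg_addr;

/* w0 is the first operand word; w1 is the second word, which is only
   meaningful for the long form. */
seg_addr decode_seg_addr(uint16_t w0, uint16_t w1)
{
    seg_addr a;
    a.is_long = (w0 & 0x8000u) != 0;   /* most significant bit set? */
    a.segment = (w0 >> 8) & 0x7fu;     /* 0x85 & 0x7f gives 5 */
    /* Long form: the low byte of w0 (the "00") is ignored and the
       offset is the whole second word. */
    a.offset = a.is_long ? w1 : (unsigned)(w0 & 0xffu);
    return a;
}

/* Self-check with the values quoted in the text. */
void check_decode(void)
{
    seg_addr a = decode_seg_addr(0x8500u, 0x0ECBu);  /* "85 00 0ECB" */
    seg_addr b = decode_seg_addr(0x05CBu, 0);        /* "05CB" */
    assert(a.is_long && a.segment == 5 && a.offset == 0x0ECB);
    assert(!b.is_long && b.segment == 5 && b.offset == 0xCB);
}
```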




For a final example consider the following line: 

0500042c|5C31 020B 8500097F| | 53| 01F4 5C31 020B| | 107| | | LDM |R2, BUFF+5(R3), #12 |

The first, second and fourth digits, "5C_1", in the opcode
show that this is a LDM (load multiple) instruction with
indexed addressing mode and an immediate value representing
the number of registers to be moved. The third digit in
the numeric opcode, "3", is the source index register R3.
The first operand is a four nibble value "020B". The "2"
in the operand represents the destination register R2. The
digit "B" (decimal 11) represents the count and is one less
than the immediate value supplied (12 written as #12). The
second operand is a 32 bit segmented address similar in
format to the segmented address described above.
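
The digit-by-digit reading of the opcode "5C31" and of the operand "020B" described above may be sketched as follows. The structure ldm_fields and the function decode_ldm are hypothetical names introduced for this sketch only; it follows the field positions stated in the text and does not attempt to cover other instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields named in the text. */
typedef struct {
    int      is_ldm_indexed;  /* 1 if the opcode matches the pattern 5C_1 */
    unsigned index_reg;       /* third opcode digit */
    unsigned dest_reg;        /* second digit of the first operand */
    unsigned count;           /* number of registers: last digit + 1 */
} ldm_fields;

ldm_fields decode_ldm(uint16_t opcode, uint16_t operand)
{
    ldm_fields f;
    unsigned d1 = (opcode >> 12) & 0xf;   /* first digit  */
    unsigned d2 = (opcode >> 8) & 0xf;    /* second digit */
    unsigned d3 = (opcode >> 4) & 0xf;    /* third digit  */
    unsigned d4 = opcode & 0xf;           /* fourth digit */

    f.is_ldm_indexed = (d1 == 0x5 && d2 == 0xC && d4 == 0x1);
    f.index_reg = d3;                     /* "3" selects R3 */
    f.dest_reg  = (operand >> 8) & 0xf;   /* "2" selects R2 */
    f.count     = (operand & 0xf) + 1;    /* "B" (11) means 12 registers */
    return f;
}
```

For the listing line above, decode_ldm(0x5C31, 0x020B) reports index register 3, destination register 2 and a count of 12.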

The segmented address is split as "85 00 097F". The
first two digits, "85", represent segment number 5.
Because the most significant bit is set, this represents a
long form (32 bit) segmented address. The "00" is ignored.
The "097F" is the offset within the segment. When
converted to C, the following two lines are printed:

/*0500042c|5C31 020B 8500097F| | 53| 01F4 5C31 020B| | 107| | | LDM |R2, BUFF+5(R3), #12 |*/
LDMXS(R2, 0x85 & 0x7f, 0x097F, R3, 11)

The second of the above two lines is the actual C code that
ultimately will be compiled and executed on the target
machine. This C code means "load 12 registers, beginning
with R2, from memory location segment 5, offset 0x097F,
indexed by R3". Each part of this line of C code is
described as follows:

LDMXS : this is a macro for "load multiple indexed segmented".

R2 : the destination register R2.
0x85 & 0x7f : the second of five arguments to the above macro.
This is resolved by the compiler to "5", meaning the segment number 5.
0x097F : the offset within the source segment.
R3 : the source index register.
11 : (the count - 1)

After expansion by the C preprocessor the above line of
code may become:
{
    unsigned int dst = (unsigned char *)&(reg.words[2]) - (unsigned char *)&(reg.words[0]);
    unsigned int count = 11 + 1;
    unsigned int src = (0x097F + (reg.words[3]));
    for (; count; count--) {
        *((unsigned short *)&reg.chars[dst]) = (*((unsigned short *)(&(((unsigned char *)(seg_ptr[(0x85 & 0x7f) & 0x7f]))[(src) & 0xfffe]))));
        dst += 2;
        dst &= 0x1e;
        src += 2;
    }
}






A more natural way to express the above C code is: 
LDMX(R2, BUFF[5], R3, 12); 
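
For illustration, the segmented LDMXS form may be written as an ordinary function that can be compiled and exercised on a host. This is a sketch under stated assumptions: the union regfile (16 word registers overlaid on 32 bytes), the seg_ptr table, the buffer segment5 and the names ldmxs and ldmxs_demo are assumed here and are not taken from the translator. memcpy replaces the pointer cast of the expansion to avoid alignment and aliasing problems on some hosts.

```c
#include <assert.h>
#include <string.h>

/* Assumed register file: 16 word registers, also addressable as bytes. */
union regfile {
    unsigned short words[16];
    unsigned char  chars[32];
};
union regfile reg;
unsigned char *seg_ptr[128];
static unsigned char segment5[0x10000];

/* Load (count_minus_1 + 1) word registers, starting at register
   `first`, from segment `seg`, offset `off`, indexed by register `idx`. */
void ldmxs(unsigned first, unsigned seg, unsigned off,
           unsigned idx, unsigned count_minus_1)
{
    unsigned int dst = (unsigned int)((unsigned char *)&reg.words[first]
                                      - (unsigned char *)&reg.words[0]);
    unsigned int count = count_minus_1 + 1;
    unsigned int src = off + reg.words[idx];   /* read index before loading */
    unsigned char *base = seg_ptr[seg & 0x7f];

    for (; count; count--) {
        memcpy(&reg.chars[dst], &base[src & 0xfffe], 2);
        dst += 2;
        dst &= 0x1e;   /* wrap within the 16-register file */
        src += 2;      /* the 0xfffe mask keeps the access in the segment */
    }
}

/* Hypothetical self-check: R3 = 1, so words are read from 0x0980 on. */
void ldmxs_demo(void)
{
    unsigned i;
    seg_ptr[5] = segment5;
    for (i = 0; i < 12; i++) {
        unsigned short v = (unsigned short)(0x1000 + i);
        memcpy(&segment5[0x0980 + 2 * i], &v, 2);
    }
    reg.words[3] = 1;
    ldmxs(2, 0x85 & 0x7f, 0x097F, 3, 11);
    assert(reg.words[2] == 0x1000 && reg.words[13] == 0x100B);
}
```

Unlike the LDMX form, this sketch keeps the segment number and the 16-bit offset mask explicit.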




Note that LDMXS becomes LDMX because the symbolic opcode
conversion also does away with segmentation. "BUFF"
becomes a linear address within a 32-bit linear addressing
space. What is not immediately apparent is that this code
may not produce the correct functionality in the target
environment. Because "BUFF" is a linear address within a
linear addressing space, the concept of segmentation does
not exist. There is no indication in the above code
whether it is important to consider wraparound within the
segment, or whether R3 should be considered signed or
unsigned. In this situation, numeric opcode decoding is
required to produce accurate code, or this line of code
should be flagged for human intervention after conversion.
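
The ambiguity can be made concrete with a small sketch. The three functions below are hypothetical and only contrast the possible readings of "offset plus index register"; the numeric base 0x097F is reused for the linear cases purely for comparison, since the mapping of segment 5 into the linear space is not specified here. The signed reading assumes the usual two's complement conversion to int16_t.

```c
#include <assert.h>
#include <stdint.h>

/* Segmented reading: the effective offset is a 16-bit quantity, so the
   sum wraps around inside the 64 KB segment. */
uint16_t segmented_offset(uint16_t base, uint16_t index)
{
    return (uint16_t)(base + index);
}

/* Linear reading with the index treated as unsigned: no wraparound. */
uint32_t linear_unsigned(uint32_t base, uint16_t index)
{
    return base + index;
}

/* Linear reading with the index treated as a signed 16-bit value. */
uint32_t linear_signed(uint32_t base, uint16_t index)
{
    return base + (uint32_t)(int32_t)(int16_t)index;
}
```

With R3 holding 0xFFFF, the segmented reading yields offset 0x097E, the unsigned linear reading yields 0x1097E, and the signed linear reading yields 0x097E. Only the numeric opcode, or a human reviewer, can say which behavior the original program relied on.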

It may be seen that numeric opcode decoding is necessary
in certain situations. Therefore, a combination of
symbolic and numeric opcode decoding may be necessary to
produce accurate, human-readable and maintainable code.
Accordingly, each listing line that contains an
assembler instruction is processed and translated in this
manner. The final output may include the absolute listing
lines as comments, the C equivalent sections of code for
each instruction line, and comments that were part of the
original assembler program. To make the resultant C
program easier to read, certain comment lines and
extraneous text may be deleted from the final output.
Although specific examples were given in Z8000 assembler
language and C, the methodology of the present invention
may be adapted to most, if not all, assembler languages and
high-level computer languages.
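
As a minimal sketch of the output form described above, the following function writes one absolute listing line as a C comment followed by its C equivalent. The function name emit_translation and its parameters are hypothetical. The check for a comment terminator inside the listing text is an added precaution of this sketch, not something stated in the description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the listing line as a comment, then the C code, each on its own
   line. Returns the number of characters produced, or -1 if the listing
   text would terminate the comment early or the buffer is too small. */
int emit_translation(char *out, size_t outsz,
                     const char *listing, const char *c_code)
{
    int n;
    if (strstr(listing, "*/") != NULL)
        return -1;
    n = snprintf(out, outsz, "/*%s*/\n%s\n", listing, c_code);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```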

It may be noted that the delineation between the
preprocessor process and the conversion process is merely
artificial and serves to better highlight the functional
aspects of the processes. The two processes may be easily
merged into one seamless process, which performs multiple
passes over the listing lines stored in a data structure in
internal memory.
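
One way to picture such a merged process is an array of listing lines over which a sequence of passes is run. The sketch below is illustrative only: the structure listing_line, its fields, and the two placeholder passes are assumptions, and the conversion pass merely marks a line as translated instead of decoding its opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one listing line. */
typedef struct {
    unsigned long address;  /* absolute address field */
    unsigned      opcode;   /* numeric opcode, 0 if the line has none */
    const char   *c_code;   /* translation, filled in by a later pass */
} listing_line;

typedef void (*pass_fn)(listing_line *lines, size_t n);

/* Placeholder preprocessing pass: lines without an opcode need no code. */
void pass_preprocess(listing_line *lines, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (lines[i].opcode == 0)
            lines[i].c_code = "";
}

/* Placeholder conversion pass: a real converter would decode the opcode. */
void pass_convert(listing_line *lines, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (lines[i].c_code == NULL)
            lines[i].c_code = "/* translated */";
}

/* Runs every pass, in order, over the same stored lines. */
void run_passes(listing_line *lines, size_t n)
{
    pass_fn passes[] = { pass_preprocess, pass_convert };
    size_t p;
    for (p = 0; p < sizeof passes / sizeof passes[0]; p++)
        passes[p](lines, n);
}
```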

Although several embodiments of the present
invention and its advantages have been described in detail,
it should be understood that mutations, changes,
substitutions, transformations, modifications, variations,
and alterations can be made therein without departing from
the teachings of the present invention, the spirit and
scope of the invention being set forth by the appended
claims.
What is claimed is:

1. A computer-implemented method of translating an
assembler program into a high-level language computer
program, comprising:

receiving each line of the assembler program;
parsing individual fields in each assembler program
line, including an absolute line number and an opcode for
an assembler instruction;
storing each assembler program line into a data
structure such that each line is accessible and each field
in each line is accessible;

examining the opcode of each assembler program line;
for each assembler program line, traversing a
decision tree in response to value of opcode to identify
the corresponding assembler instruction;

translating the assembler instruction to an
equivalent set of code in the high-level computer language;
and

outputting the equivalent set of code for each
assembler program line.

2. The method, as set forth in claim 1, wherein
opcode examining comprises parsing a numeric opcode of each
assembler program line into individual digits.

3. The method, as set forth in claim 1, wherein
opcode examining comprises:

examining a symbolic opcode; and
examining one or two symbolic operands.

4. The method, as set forth in claim 1, further
comprising:

resolving a target address for a jump opcode; and
inserting a unique target label at the target
address.

5. The method, as set forth in claim 1, further
comprising identifying a label indicative of the start of
a function.

6. The method, as set forth in claim 1, further
comprising:

identifying an assembler instruction contained on
more than one line; and

combining data contained in the more than one line
of the assembler instruction into one line.

7. The method, as set forth in claim 6, further
comprising discarding certain repetitive data contained in
the more than one line of the assembler instruction.

8. The method, as set forth in claim 1, further
comprising:

identifying an assembler program line containing an
opcode and a label; and

generating a first assembler program line containing
the label and a second assembler program line containing
the opcode.

9. The method, as set forth in claim 1, further
comprising:

discarding space characters and other extraneous
characters after the parsing step; and

recombining the parsed fields of each assembler
program line into an assembler program line.

10. The method, as set forth in claim 1, further 
comprising inserting the equivalent set of code into the 
data structure proximately to the assembler program line. 

11. The method, as set forth in claim 1, further
comprising:

receiving a listing generated by an assembler;
receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute address of a listing line
contained in the MAP listing to each listing line.

12. The method, as set forth in claim 1, further
comprising:

receiving a listing generated by an assembler;
receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute value of a relocatable operand
contained in the MAP listing to each listing line.

13. The method, as set forth in claim 1, further 
comprising creating RAM definitions of data definitions and 
included data.

14. The method, as set forth in claim 1, wherein 
the storing step includes storing the assembler program 
lines into an array. 

15. The method, as set forth in claim 1, wherein
the decision traversing step comprises:

for each assembler program line:

determining the value of a first opcode digit;

determining the value of a second opcode digit
in response to the value of the first opcode digit;

determining the value of a third opcode digit
in response to the values of the first and second
opcode digits; and

determining the value of a fourth opcode digit
and therefore the assembler instruction in response
to the values of the first, second and third opcode
digits.

16. The method, as set forth in claim 1, further
comprising:

noting all function entry points and exit points;

flagging any inconsistencies in the number of entry
points and exit points.




17. A computer-implemented method of translating an
assembler program into a high-level language computer
program, comprising:

receiving each line of the assembler program;
parsing individual fields in each assembler program
line, including an absolute line number and a numeric
opcode for an assembler instruction;

storing each assembler program line into a data
structure such that each line is accessible and each field
in each line is accessible;

parsing the numeric opcode of each assembler program
line into individual digits;

for each assembler program line:

determining the value of a first numeric
opcode digit;
determining the value of a second numeric
opcode digit in response to the value of the first
numeric opcode digit;

determining the value of a third numeric opcode
digit in response to the values of the first and
second numeric opcode digits;

determining the value of a fourth numeric
opcode digit and therefore the assembler instruction
in response to the values of the first, second and
third numeric opcode digits; and

translating the assembler instruction to an
equivalent set of code in the high-level computer
language;

outputting the equivalent set of code for each
assembler program line.

18. The method, as set forth in claim 17, further
comprising:

resolving a target address for a jump opcode; and
inserting a unique target label at the target
address.

19. The method, as set forth in claim 17, further
comprising identifying a label indicative of the start of
a function.

20. The method, as set forth in claim 17, further
comprising:

identifying an assembler instruction contained on
more than one line; and

combining data contained in the more than one line
of the assembler instruction into one line.

21. The method, as set forth in claim 20, further
comprising discarding certain repetitive data contained in
the more than one line of the assembler instruction.

22. The method, as set forth in claim 17, further
comprising:

identifying an assembler program line containing an
opcode and a label; and

generating a first assembler program line containing
the label and a second assembler program line containing
the opcode.




23. The method, as set forth in claim 17, further
comprising:

discarding space characters and other extraneous
characters after the parsing step; and

recombining the parsed fields of each assembler
program line into an assembler program line.

24. The method, as set forth in claim 17, further 
comprising inserting the equivalent set of code into the 
data structure proximately to the assembler program line. 

25. The method, as set forth in claim 17, further
comprising:

receiving a listing generated by an assembler;
receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute address of a listing line
contained in the MAP listing to each listing line.

26. The method, as set forth in claim 17, further
comprising:

receiving a listing generated by an assembler;
receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute value of a relocatable operand
contained in the MAP listing to each listing line.

27. The method, as set forth in claim 17, further 
comprising creating RAM definitions of data definitions and 
included data. 

28. The method, as set forth in claim 17, wherein 
the storing step includes storing the assembler program 
lines into an array. 

29. A computer-implemented method of translating an
assembler program into a high-level language computer
program, comprising:

receiving each line of the assembler program;

parsing individual fields in each assembler program
line, including an absolute line number and a symbolic
opcode and at least one operand for an assembler
instruction;

storing each assembler program line into a data
structure such that each line is accessible and each field
in each line is accessible;

examining the symbolic opcode and determining the
opcode instruction;

examining at least one operand; and

translating the assembler instruction to an
equivalent set of code in the high-level computer language.

30. The method, as set forth in claim 29, further
comprising:

resolving a target address for a jump opcode; and
inserting a unique target label at the target
address.

31. The method, as set forth in claim 29, further
comprising identifying a label indicative of the start of
a function.

32. The method, as set forth in claim 29, further
comprising:

identifying an assembler instruction contained on
more than one line; and

combining data contained in the more than one line
of the assembler instruction into one line.

33. The method, as set forth in claim 32, further
comprising discarding certain repetitive data contained in
the more than one line of the assembler instruction.

34. The method, as set forth in claim 29, further
comprising:

identifying an assembler program line containing an
opcode and a label; and

generating a first assembler program line containing
the label and a second assembler program line containing
the opcode.

35. The method, as set forth in claim 29, further
comprising:

discarding space characters and other extraneous
characters after the parsing step; and

recombining the parsed fields of each assembler
program line into an assembler program line.

36. The method, as set forth in claim 29, further
comprising inserting the equivalent set of code into the
data structure proximately to the assembler program line.

37. The method, as set forth in claim 29, further
comprising:

receiving a listing generated by an assembler;
receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute address of a listing line
contained in the MAP listing to each listing line.

38. The method, as set forth in claim 29, further
comprising:

receiving a listing generated by an assembler;

receiving a MAP listing and an assembly executable
file from a linker; and

adding the absolute value of a relocatable operand
contained in the MAP listing to each listing line.

39. The method, as set forth in claim 29, further
comprising creating RAM definitions of data definitions and
included data.

40. The method, as set forth in claim 29, wherein
the storing step includes storing the assembler program
lines into an array.